Abstract


Many goal-oriented dialog agents are expected to identify slot-value pairs in a spoken query, then perform a lookup in a knowledge base to complete the task. When the agent encounters unknown slot-values, it may ask the user to repeat or reformulate the query. A robust agent, however, can proactively seek new knowledge from users to help reduce subsequent task failures. In this paper, we propose knowledge acquisition strategies for a dialog agent and show their effectiveness. We also show that the acquired knowledge can subsequently contribute to task completion.


1 Introduction 


Many spoken dialog agents are designed to perform specific tasks in a specified domain, e.g., providing information about public events in a city. To carry out its task, an agent parses an input utterance, fills in slot-value pairs, then completes the task. Sometimes, information on these slot-value pairs is not available in its knowledge base. In such cases, the agent typically categorizes the utterances as non-understanding errors. Ideally, the incident is recorded and the missing knowledge is incorporated into the system with a developer's assistance, which is a slow offline process.

There are other sources of knowledge: automatically crawling the web, as done by NELL [Carlson et al., 2010], and community knowledge bases such as Freebase [Bollacker et al., 2008]. These approaches provide globally popular slot-values [Araki, 2012] and high-level semantic contexts [Pappu and Rudnicky, 2013]. Despite their size, these knowledge bases may not contain information about the entities in a specific target domain. However, users in the agent's domain can potentially provide specific information on slot-values that are unavailable on the web, e.g., a recent interest or hobby of the user's friend. Lasecki et al. [2013] have elicited natural language dialogs from humans to build NLU models for an agent, and Bigham et al. [2010] have elicited answers to visual questions by integrating users into the system. One observation from these studies is that both users and non-users can impart useful knowledge to a system. In this paper, we propose spoken language strategies that allow an agent to elicit new slot-value pairs from its own user population to extend its knowledge base. Open-domain knowledge may be elicited from non-users of the system through text-based questionnaires, but in a situated interaction scenario spoken strategies may be more effective. We address the following research questions:


1. Can an agent elicit reliable knowledge about its domain from users, particularly knowledge it cannot locate elsewhere (e.g., in on-line knowledge bases)? Is the collective knowledge of the users sufficient to allow the agent to augment its knowledge through interactive means?


2. What strategies elicit useful knowledge from users? Based on previous work in common sense knowledge acquisition [Von Ahn, 2006, Singh et al., 2002, Witbrock et al., 2003], we devise spoken language strategies that allow the system to solicit information by presenting concrete situations and by asking user-centric questions.


We address these questions in the context of the EVENTSPEAK dialog system, an agent that provides information about seminars and talks in an academic environment. This paper is organized as follows. In Section 2, we discuss knowledge acquisition strategies. In Section 3, we describe a user study on these strategies. In Section 4, we present an evaluation of the system-acquired knowledge, and in Section 5 we make concluding remarks.


Table 1: System-initiated strategies used by the agent for knowledge acquisition in the EVENTSPEAK system.


Strategy type | Strategy     | Example prompt
QUERYDRIVEN   | QUERYEVENT   | I know events on campus. What do you want to know?
QUERYDRIVEN   | QUERYPERSON  | I know some of the researchers on campus. Whom do you want to know about?
PERSONAL      | BUZZWORDS    | What are some of the popular phrases in your research?
PERSONAL      | FAMOUSPEOPLE | Tell me some well-known people in your research area.
SHOW&ASK      | TWEET        | How would you describe this talk in a sentence, say a tweet?
SHOW&ASK      | KEYWORDS     | Give keywords for this talk in your own words.
SHOW&ASK      | PEOPLE       | Do you know anyone who might be interested in this talk?
PEOPLE Do you know anyone who might be interested in this talk? 


2 Knowledge Acquisition Strategies 


We posit three different circumstances that can trigger knowledge acquisition behavior: (1) initiated by expert users of the system [Holzapfel et al., 2008, Spexard et al., 2006, Lütkebohle et al., 2009, Rudnicky et al., 2010], (2) triggered by "misunderstanding" of the user's input [Chung et al., 2003, Filisko and Seneff, 2005, Prasad et al., 2012, Pappu et al., 2014], or (3) triggered by the system. In this work we focus on the third case; the system-initiated strategies we use are described below and summarized in Table 1.

QUERYDRIVEN. The system prompts a user with an open-ended question akin to "how-may-I-help-you" to learn which "values" of a slot are of interest to the user. This strategy does not ground the user about the system's knowledge limitations. However, it allows the system to acquire information (slot-value pairs) from the user's input. The system can choose to respond to the input or to ignore it, depending on its knowledge about the slot-value pairs in the input. Table 1 shows the strategies of this kind, i.e., QUERYEVENT and QUERYPERSON.
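The respond-or-ignore behavior of QUERYDRIVEN can be sketched as follows. This is a minimal illustration, not the EVENTSPEAK implementation: we assume the knowledge base is a dictionary from slot names to sets of known values, that slot-value pairs have already been extracted from the recognized utterance, and the function name `handle_query_driven` is hypothetical. Known values are answered; unknown values are not answered but are kept as candidate knowledge.

```python
from collections import defaultdict


def handle_query_driven(kb, slot_values, acquired):
    """Split the user's slot-value pairs into answerable and newly acquired.

    kb:          slot name -> set of values the agent already knows (assumed layout)
    slot_values: (slot, value) pairs already extracted from the recognized query
    acquired:    slot name -> set of unknown values heard from users (updated in place)
    Returns the (slot, value) pairs the agent can respond to.
    """
    answerable = []
    for slot, value in slot_values:
        if value in kb.get(slot, set()):
            answerable.append((slot, value))
        else:
            # Not answered now, but kept as candidate knowledge for later use.
            acquired[slot].add(value)
    return answerable


# Hypothetical knowledge base and query.
kb = {"person": {"rich stern"}, "topic": {"speech recognition"}}
acquired = defaultdict(set)
reply_to = handle_query_driven(
    kb, [("person", "rich stern"), ("topic", "crowdsourcing")], acquired)
print(reply_to)        # [('person', 'rich stern')]
print(dict(acquired))  # {'topic': {'crowdsourcing'}}
```

Keeping the unknown values in a separate store, rather than writing them straight into `kb`, leaves room for verifying them (e.g., by asking other users) before they are used to answer queries.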

PERSONAL. The system asks a user about their own interests and about people who may share those interests. This is an open-ended request as well, but the system expects the response to be confined to the user's knowledge about specific entities in the environment. The BUZZWORDS and FAMOUSPEOPLE strategies expect the user to provide values for the research-interest and researcher-name slots, respectively.

SHOW&ASK. The system provides a description of an event and asks questions that ground the user's responses in relation to that event. For example, given the title and abstract of a technical talk, the system asks the user questions about the talk. The TWEET strategy is expected to elicit a concise description of the event, which may eventually help the agent both to summarize events for other users and to identify keywords for an event. The KEYWORDS strategy expects the user to explicitly supply keywords for an event. The PEOPLE strategy expects the user to provide names of likely event participants.
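The three strategy types can be summarized as a small registry recording, for each strategy in Table 1, its type, its example prompt and the slot its responses are expected to fill. This is a hypothetical encoding for illustration only: the slot names `researcher_name`, `research_interest` and `event_description` are our assumptions (in particular, mapping QUERYEVENT to research interests is our guess), not identifiers from EVENTSPEAK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    strategy_type: str  # "QUERYDRIVEN", "PERSONAL" or "SHOW&ASK"
    prompt: str         # example prompt taken from Table 1
    expected_slot: str  # hypothetical slot the response is expected to fill


STRATEGIES = (
    Strategy("QUERYEVENT", "QUERYDRIVEN",
             "I know events on campus. What do you want to know?",
             "research_interest"),
    Strategy("QUERYPERSON", "QUERYDRIVEN",
             "I know some of the researchers on campus. Whom do you want to know about?",
             "researcher_name"),
    Strategy("BUZZWORDS", "PERSONAL",
             "What are some of the popular phrases in your research?",
             "research_interest"),
    Strategy("FAMOUSPEOPLE", "PERSONAL",
             "Tell me some well-known people in your research area.",
             "researcher_name"),
    Strategy("TWEET", "SHOW&ASK",
             "How would you describe this talk in a sentence, say a tweet?",
             "event_description"),
    Strategy("KEYWORDS", "SHOW&ASK",
             "Give keywords for this talk in your own words.",
             "research_interest"),
    Strategy("PEOPLE", "SHOW&ASK",
             "Do you know anyone who might be interested in this talk?",
             "researcher_name"),
)


def strategies_filling(slot):
    """Names of the strategies whose responses are expected to fill `slot`."""
    return [s.name for s in STRATEGIES if s.expected_slot == slot]


def prompts_for(strategy_type, event_title=None):
    """Prompts of one strategy type; SHOW&ASK prompts are grounded in a shown event."""
    if strategy_type == "SHOW&ASK" and event_title is None:
        raise ValueError("SHOW&ASK prompts need an event to show")
    prompts = []
    for s in STRATEGIES:
        if s.strategy_type == strategy_type:
            prompts.append(f"{event_title}: {s.prompt}" if event_title else s.prompt)
    return prompts
```

For example, `strategies_filling("researcher_name")` returns the three strategies that can contribute to the researcher-name column of a corpus count.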

We hypothesized that these strategies would allow the agent to learn new slot-value pairs that may help improve task performance.


3 Knowledge Acquisition Study 


We conducted a user study to determine the reliability of the information acquired by the system. We performed this study using the EVENTSPEAK¹ dialog system, which provides information about upcoming talks and other events that might be of interest, and about ongoing research on campus. The system presents material on a screen and accepts spoken input, in a setting similar to a kiosk.

The study evaluated the performance of the seven strategies described above. For the SHOW&ASK strategies, we had users respond regarding a specific event. We used descriptions of research talks collected from the university's website. We used a web-based interface for data collection; the interface presented the prompt material and recorded the subject's voice response. Testvox² was used to set up the experiments, and Wami³ for audio recording.


3.1 User Study Design 


We recruited 40 researchers (graduate students) from the School of Computer Science at Carnegie Mellon, representative of the user population for the EVENTSPEAK dialog system. Each subject responded to prompts from the QUERYDRIVEN, PERSONAL and SHOW&ASK strategies.

In the QUERYDRIVEN tasks, under the QUERYEVENT strategy, the system responds to the user's query with a list of talks. The user's response is recorded, then sent to an open-vocabulary speech recognizer; the recognition result is used as a query to a database of talks. The results are then displayed on the screen. The system applies the QUERYPERSON strategy in a similar way. In the PERSONAL tasks, the system applies the BUZZWORDS strategy to ask the user about popular keyphrases in their research area. The system then asks about well-known researchers (FAMOUSPEOPLE) in the user's area.


¹ http://www.speech.cs.cmu.edu/apappu/kacq
² https://bitbucket.org/happyalu/testvox/wiki/Home
³ https://code.google.com/p/wami-recorder/


Figure 1: Time per task for all strategies.

In the SHOW&ASK tasks, we used two seminar descriptions per subject. (In our pilot study, we found that people provide more diverse responses, in terms of entities, in SHOW&ASK, which is based on the event abstract, than in PERSONAL and QUERYDRIVEN.) We used a set of 80 research talk announcements, each consisting of a title, an abstract and other information. For each talk, the system used all three strategies, viz., TWEET, KEYWORDS and PEOPLE. For the TWEET tasks, subjects were asked to provide a one-sentence description. They were allowed to give a non-technical, high-level description if they were unfamiliar with the topic. For the PEOPLE task, subjects had to give names of colleagues who might be interested in the talk. For the KEYWORDS task, subjects provided keywords, either in their own words or selected from the abstract.

Since the material is highly technical, we were interested in whether the tasks are cognitively demanding for people who are less familiar with the subject of a talk. Therefore, users were asked to indicate their familiarity with a particular talk (and with its research area in general) on a scale of 1 to 4, with 4 being most familiar and 1 being least familiar.


3.2 Corpus Description 


This user study produced 64 minutes of audio data, on average 1.6 minutes per subject. We transcribed the speech, then annotated the corpus for people's names and for research interests. Table 2 shows the number of unique slot-values found in the corpus. We observe that the number of unique research interests produced during SHOW&ASK is higher than for the other strategies. This confirms our initial observation that this strategy elicits diverse responses. The PERSONAL task produced the highest number of researcher names (77, via the FAMOUSPEOPLE strategy), though only marginally more than SHOW&ASK (76). One explanation might be that people find it easier to recall names in their own research area than in other areas. Overall, we identified 139 unique researcher names and 485 unique research interests.


Figure 2: Time per task vs. expertise.


Table 2: Corpus statistics


Strategy type | Unique researcher names | Unique research interests
QUERYDRIVEN   | 21                      | 30
PERSONAL      | 77                      | 107
SHOW&ASK      | 76                      | 390
Overall       | 139                     | 485
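The Overall row is smaller than the column sums (21 + 77 + 76 = 174 names, 30 + 107 + 390 = 527 interests), which is consistent with counting the union of values across strategy types, since the same name or interest can be elicited by several strategies. The sketch below shows that kind of count; the triple layout, the whitespace/case normalization and the toy data are our assumptions, not the annotation format of the released corpus.

```python
from collections import defaultdict


def unique_counts(annotations):
    """Count unique slot values per strategy type and overall.

    annotations: iterable of (strategy_type, slot, value) triples from the
    annotated transcripts. "Overall" is the union across strategy types, so
    a value elicited by two strategies is counted once there.
    """
    per_type = defaultdict(lambda: defaultdict(set))
    overall = defaultdict(set)
    for strategy_type, slot, value in annotations:
        key = " ".join(value.lower().split())  # assumed normalization
        per_type[strategy_type][slot].add(key)
        overall[slot].add(key)
    table = {t: {slot: len(vals) for slot, vals in slots.items()}
             for t, slots in per_type.items()}
    table["Overall"] = {slot: len(vals) for slot, vals in overall.items()}
    return table


# Toy annotations (invented, not from the corpus).
toy = [
    ("PERSONAL", "name", "researcher a"),
    ("SHOW&ASK", "name", "Researcher  A"),
    ("SHOW&ASK", "name", "researcher b"),
    ("PERSONAL", "interest", "prosody"),
    ("SHOW&ASK", "interest", "prosody"),
    ("SHOW&ASK", "interest", "web search"),
]
counts = unique_counts(toy)
print(counts["Overall"])  # {'name': 2, 'interest': 2}
```

Here the per-type name counts sum to 3 while the overall count is 2, the same pattern as in Table 2.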


3.3 Corpus Analysis 


One of the objectives of this work is to determine which strategies the agent can use to elicit knowledge from users. Although the time cost will vary with task and domain, a usable strategy should, in general, be less demanding. We analyzed the time per task for each strategy, shown in Figure 1. We found that the TWEET strategy is not only more demanding but also has higher variance than the other tasks. One explanation is that people attempted to summarize the entire abstract, including technical details, even though the instructions indicated that a non-technical description was acceptable. Figure 2 shows a similar trend: irrespective of expertise level, subjects take more time to give one-sentence descriptions. We also observe high variance and a higher time per task for QUERYPERSON; this is because the system deliberately returned no results for this task. This was done to find out whether subjects would repeat the task on failure. Ideally, the system should use this strategy only rarely, so as not to lose the user's trust, and should use it to solicit multiple values for a given slot (e.g., person name), as opposed to requesting a list of values as in the FAMOUSPEOPLE and PEOPLE strategies. We find that the PEOPLE, KEYWORDS, FAMOUSPEOPLE and BUZZWORDS strategies are efficient, with a time per task of less than one minute. As shown in Figure 2, subjects do not take much time to speak a list of names or keywords.


Table 3: Mean precision for 200 researchers, broken down by the "source" strategy used to acquire their names.
Note: only 85 of the 200 researchers had Google Scholar pages; GScholar accuracy is computed for only those 85.


Metric         | Description text | SHOW&ASK | PERSONAL | QUERYDRIVEN | Mean
Mean precision | 89.5%            | 86.9%    | 93.6%    | 86.2%       | 90.5%
GScholar acc.  | 78.3%            | 82.3%    | 86.1%    | 100%        | 80.0%
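The time-per-task analysis behind Figures 1 and 2 amounts to grouping response times by strategy (and by self-reported familiarity) and computing means and spreads. The sketch below is our illustration under stated assumptions: each response is a hypothetical (strategy, familiarity, seconds) record, spread is the sample standard deviation, and the one-minute threshold is the one used in the text. The timings are invented for illustration only and are not the study's measurements.

```python
import statistics
from collections import defaultdict


def time_per_task(records):
    """records: iterable of (strategy, familiarity, seconds) tuples.
    Returns strategy -> (mean seconds, sample standard deviation, n)."""
    by_strategy = defaultdict(list)
    for strategy, _familiarity, seconds in records:
        by_strategy[strategy].append(seconds)
    summary = {}
    for strategy, times in by_strategy.items():
        spread = statistics.stdev(times) if len(times) > 1 else 0.0
        summary[strategy] = (statistics.mean(times), spread, len(times))
    return summary


def mean_time_by_familiarity(records, strategy):
    """Mean seconds for one strategy at each self-reported familiarity level (1-4)."""
    levels = defaultdict(list)
    for s, familiarity, seconds in records:
        if s == strategy:
            levels[familiarity].append(seconds)
    return {level: statistics.mean(times) for level, times in sorted(levels.items())}


def efficient_strategies(summary, limit_seconds=60.0):
    """Strategies whose mean time per task is below the limit (one minute)."""
    return sorted(s for s, (mean, _sd, _n) in summary.items() if mean < limit_seconds)


# Invented timings for illustration only; not the study's measurements.
records = [("TWEET", 4, 80.0), ("TWEET", 1, 100.0),
           ("KEYWORDS", 4, 30.0), ("KEYWORDS", 1, 40.0)]
summary = time_per_task(records)
print(efficient_strategies(summary))  # ['KEYWORDS']
```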


4 Evaluation of Acquired Knowledge 


To answer the question "Can an agent elicit reliable knowledge about its domain from users?", we analyzed the relevance of the acquired knowledge. We have two disjoint lists of entities, (a) researchers and (b) research interests; in addition, we have speaker names from the talk descriptions. Our goal is to implicitly infer a list of interests for each researcher without exhaustively asking users about the interests of every researcher. To each researcher in the list, we attribute the interests that were mentioned in the same context as that researcher. We tag the names acquired from the FAMOUSPEOPLE strategy with the keywords acquired from the BUZZWORDS strategy, where both lists were acquired from the same user. We repeat this process for each name mentioned in relation to a talk in the SHOW&ASK strategy: we tag the keywords mentioned in the KEYWORDS strategy to the researchers mentioned in the PEOPLE strategy.
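The context-based attribution described above can be sketched as a grouping step followed by a per-context union. This is our illustration, not the authors' code: we assume each mention has already been assigned a context identifier (for example, a hypothetical (user, "PERSONAL") pair for FAMOUSPEOPLE plus BUZZWORDS answers, or a (user, talk) pair for PEOPLE plus KEYWORDS answers), and the function name and toy data are hypothetical. Speaker names from the talk descriptions could be added as name mentions in the corresponding talk context.

```python
from collections import defaultdict


def attribute_interests(mentions):
    """Attribute to each researcher the interests mentioned in the same context.

    mentions: iterable of (context_id, kind, value), where kind is "name" or
    "interest". A context groups entities elicited together (context ids are
    hypothetical). Returns researcher -> set of co-mentioned interests,
    merged across all contexts in which the researcher appears.
    """
    names = defaultdict(set)
    interests = defaultdict(set)
    for context_id, kind, value in mentions:
        if kind == "name":
            names[context_id].add(value)
        elif kind == "interest":
            interests[context_id].add(value)
        else:
            raise ValueError(f"unknown mention kind: {kind!r}")
    profiles = defaultdict(set)
    for context_id, people in names.items():
        for person in people:
            profiles[person] |= interests.get(context_id, set())
    return dict(profiles)


# Toy mentions with placeholder names (invented for illustration).
mentions = [
    (("user1", "PERSONAL"), "name", "researcher a"),
    (("user1", "PERSONAL"), "interest", "prosody"),
    (("user2", "talk3"), "name", "researcher a"),
    (("user2", "talk3"), "name", "researcher b"),
    (("user2", "talk3"), "interest", "web search"),
]
profiles = attribute_interests(mentions)
print(sorted(profiles["researcher a"]))  # ['prosody', 'web search']
```

Because no user is asked about each researcher individually, a researcher's profile is only as specific as the contexts in which they were mentioned; this is what the relevance judgments below evaluate.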


4.1 Analysis 


We produced 200 entries for researchers and their sets of interests. We then had two annotators (senior graduate students) mark whether the system-predicted interests were relevant and accurate. The annotators were allowed to use information found on researchers' home pages and on Google Scholar⁴ to evaluate the system-predicted interests.

This can be seen as an information retrieval (IR) problem, where a researcher is the "query" and interests are the "documents". We therefore use mean precision, a common metric in IR, to evaluate retrieval. In our case, the ground truth for relevant interests comes from the annotators. The results are shown in Table 3. Our approach has a high mean precision of 90.5% over all 200 researchers. Precision is high irrespective of the strategy used to acquire the entities, ranging from 86.2% to 93.6% across source strategies. We also compared our predicted interests with the interests listed by the researchers themselves on Google Scholar. Only 85 researchers from our list have a Google Scholar page; for these, our accuracy is 80%, again good. Moreover, since 115 of the 200 researchers have no Google Scholar page, significant knowledge is absent from the web (at least in our domain), yet can be elicited from users familiar with the domain.

⁴ scholar.google.com
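The mean precision computation can be sketched as follows, assuming (since the exact formula is not given above) that precision is computed per researcher as the fraction of predicted interests judged relevant and then averaged over researchers: MP = (1/N) · Σ_r |P_r ∩ R_r| / |P_r|, where P_r is the set of predicted interests and R_r the set judged relevant for researcher r. How the two annotators' judgments were combined is not stated, so the sketch takes a single merged judgment set per researcher; the names and toy data are hypothetical.

```python
def precision(predicted, relevant):
    """Fraction of a researcher's predicted interests judged relevant."""
    predicted = set(predicted)
    if not predicted:
        raise ValueError("precision is undefined for an empty prediction set")
    return len(predicted & set(relevant)) / len(predicted)


def mean_precision(predictions, judgments):
    """Average per-researcher precision.

    predictions: researcher -> predicted interests (the "retrieved documents")
    judgments:   researcher -> interests the annotators marked relevant
    """
    if not predictions:
        raise ValueError("no researchers to evaluate")
    scores = [precision(interests, judgments.get(researcher, set()))
              for researcher, interests in predictions.items()]
    return sum(scores) / len(scores)


# Toy example (invented): 2 of 3 interests relevant for one researcher, 1 of 1 for the other.
predictions = {"researcher a": ["prosody", "speech synthesis", "big data"],
               "researcher b": ["web search"]}
judgments = {"researcher a": {"prosody", "speech synthesis"},
             "researcher b": {"web search"}}
print(round(mean_precision(predictions, judgments), 3))  # 0.833
```

Restricting `predictions` to the researchers that have a Google Scholar page, and using their self-listed interests as `judgments`, would give a comparable figure for the Google Scholar comparison, assuming that accuracy is computed in the same per-researcher way.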


5 Conclusion 


We describe a set of knowledge acquisition strategies that allow a system to solicit novel information from users in a situated environment. To investigate the usability of these strategies, we conducted a user study in the domain of research talks. We analyzed a corpus of system-acquired knowledge and have made the material available⁵. Our data show that users on average take less than a minute to provide new information using the proposed elicitation strategies. The reliability of the acquired knowledge in predicting relationships between researchers and interests is quite good, with a mean precision of 90.5%. We note that the PERSONAL strategy, which tries to tap personal knowledge, appears to be particularly effective. More generally, automated elicitation appears to be a promising technique for continuous learning in spoken dialog systems.

⁵ www.speech.cs.cmu.edu/apappu/pubdl/eventspeak_corpus.zip


6 Appendix


System Predicted Researcher-Interests 1
rich stern: deep neural networks, speech recognition, signal processing, neural networks, machine learning, speech synthesis


System Predicted Researcher-Interests 2
kishore prahallad: dialogue systems, prosody, speech synthesis, text to speech, pronunciation modeling, low resource languages


System Predicted Researcher-Interests 3
carolyn rose: crowdsourcing, meta discourse classification, statistical analysis, presentation skills instruction, man made system, education models, human learning


System Predicted Researcher-Interests 4
florian metze: dialogue systems, speech recognition, nlp, prosody, speech synthesis, text to speech, pronunciation modeling, low resource languages, automatic accent identification


System Predicted Researcher-Interests 5
madhavi ganapathiraju: protein structure, continuous graphical models, generative models, structural biology, protein structure dynamics, molecular dynamics


System Predicted Researcher-Interests 6
alexander hauptmann: discriminatively trained models, deep learning, computer vision, big data


System Predicted Researcher-Interests 7
jamie callan: learning to rank, search, large scale search, web search, click prediction, information retrieval, web mining, user activity, recommendation, relevance, machine learning, web crawling, distributed systems, structural similarity


System Predicted Researcher-Interests 8
lori levin: natural language understanding, knowledge reasoning, construction grammar, knowledge bases, natural language processing